Scalable Asynchronous Gradient Descent Optimization for Out-of-Core Models

Authors

  • Chengjie Qin
  • Martin Torres
  • Florin Rusu
Abstract

Existing data analytics systems have approached predictive model training exclusively from a data-parallel perspective. Data examples are partitioned across multiple workers and training is executed concurrently over different partitions, under various synchronization policies that emphasize speedup or convergence. Since models with millions and even billions of features are becoming increasingly common, model management becomes an equally important task for effective training. In this paper, we present a general framework for parallelizing stochastic optimization algorithms over massive models that cannot fit in memory. We extend the lock-free HOGWILD!-family of algorithms to disk-resident models by vertically partitioning the model offline and asynchronously updating the resulting partitions online. Unlike HOGWILD!, concurrent requests to the common model are minimized by a preemptive push-based sharing mechanism that reduces the number of disk accesses. Experimental results on real and synthetic datasets show that the proposed framework achieves improved convergence over HOGWILD! and is the only solution scalable to massive models.
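
The core idea described above, namely lock-free HOGWILD!-style updates applied to a vertically partitioned model where each worker only touches the partitions covering its example's nonzero features, can be illustrated with a short sketch. The code below is not the authors' implementation: it keeps the partitions as in-memory NumPy arrays, omits disk I/O and the push-based sharing mechanism, and the names partition_of, sgd_worker, PARTITION_SIZE, and the sparse least-squares objective are illustrative assumptions.

  # Minimal sketch of HOGWILD!-style asynchronous SGD over a vertically
  # partitioned model (assumptions: sparse least-squares loss, in-memory
  # arrays standing in for the paper's disk-resident partitions).
  import threading
  import numpy as np

  NUM_FEATURES = 1_000
  PARTITION_SIZE = 100          # features per vertical partition
  NUM_PARTITIONS = NUM_FEATURES // PARTITION_SIZE
  STEP_SIZE = 0.01
  NUM_WORKERS = 4

  # The model is split column-wise (vertically): partition p owns features
  # [p * PARTITION_SIZE, (p + 1) * PARTITION_SIZE). Here each partition is a
  # separate NumPy array; on disk it would be a separate file or page.
  model_partitions = [np.zeros(PARTITION_SIZE) for _ in range(NUM_PARTITIONS)]

  def partition_of(feature_index):
      """Map a global feature index to (partition id, local offset)."""
      return feature_index // PARTITION_SIZE, feature_index % PARTITION_SIZE

  def sgd_worker(examples):
      """Process a chunk of sparse examples, updating only the partitions
      touched by each example's nonzero features, without any locking."""
      for indices, values, label in examples:
          # Compute the prediction using only the needed model entries.
          prediction = 0.0
          for j, v in zip(indices, values):
              p, off = partition_of(j)
              prediction += model_partitions[p][off] * v
          residual = prediction - label
          # Lock-free (HOGWILD!-style) in-place update of the touched partitions.
          for j, v in zip(indices, values):
              p, off = partition_of(j)
              model_partitions[p][off] -= STEP_SIZE * residual * v

  def make_sparse_example(rng, nnz=10):
      """Generate a synthetic sparse example: nonzero indices, values, label."""
      indices = rng.choice(NUM_FEATURES, size=nnz, replace=False)
      values = rng.normal(size=nnz)
      label = float(values.sum() > 0)
      return indices, values, label

  if __name__ == "__main__":
      rng = np.random.default_rng(0)
      data = [make_sparse_example(rng) for _ in range(4_000)]
      chunks = [data[w::NUM_WORKERS] for w in range(NUM_WORKERS)]
      threads = [threading.Thread(target=sgd_worker, args=(c,)) for c in chunks]
      for t in threads:
          t.start()
      for t in threads:
          t.join()
      print("trained; partition 0 norm:", np.linalg.norm(model_partitions[0]))

In the system described by the abstract, each partition would instead live on disk and be loaded on demand, and the preemptive push-based sharing mechanism would proactively share requested partitions across workers to reduce redundant disk accesses; the sketch keeps everything in memory only for brevity.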

Similar Papers

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such ...
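
For context on the composite objectives mentioned above, a common formulation pairs a smooth data term with a nonsmooth regularizer and handles the nonsmooth part with a proximal step. The notation below is a generic sketch, not taken from the cited paper:

  \[
  \min_{x \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^{n} f_i(x) + h(x),
  \qquad
  x_{t+1} = \operatorname{prox}_{\gamma h}\!\left(x_t - \gamma \nabla f_{i_t}(x_t)\right),
  \qquad
  \operatorname{prox}_{\gamma h}(y) = \arg\min_{x} \Bigl\{ h(x) + \tfrac{1}{2\gamma}\,\|x - y\|_2^2 \Bigr\},
  \]

where each \(f_i\) is smooth, \(h\) is convex but possibly nonsmooth (e.g., the \(\ell_1\) norm), \(\gamma\) is the step size, and \(i_t\) is the example sampled at step \(t\).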

Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms

Stochastic Gradient Descent (SGD) is the standard numerical method used to solve the core optimization problem for the vast majority of machine learning (ML) algorithms. In the context of large-scale learning, as utilized by many Big Data applications, efficient parallelization of SGD is the focus of active research. Recently, we were able to show that the asynchronous communication paradigm...

Distributed asynchronous optimization of convolutional neural networks

Recently, deep Convolutional Neural Networks have been shown to outperform Deep Neural Networks for acoustic modelling, producing state-of-the-art accuracy in speech recognition tasks. Convolutional models provide increased model robustness through the usage of pooling invariance and weight sharing across spectrum and time. However, training convolutional models is a very computationally expens...

Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization

Nowadays, asynchronous parallel algorithms have received much attention in the optimization field due to the crucial demands of modern large-scale optimization problems. However, most asynchronous algorithms focus on convex problems; analysis of nonconvex problems is lacking. For the Asynchronous Stochastic Gradient Descent (ASGD) algorithm, the best result from (Lian et al., 2015) can only achieve an ...

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization

We introduce and analyze stochastic optimization methods where the input to each gradient update is perturbed by bounded noise. We show that this framework forms the basis of a unified approach to analyze asynchronous implementations of stochastic optimization algorithms. In this framework, asynchronous stochastic optimization algorithms can be thought of as serial methods operating ...
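
The perturbed iterate view summarized above can be stated compactly; the notation below is a sketch of the standard framework, not copied from the paper:

  \[
  x_{t+1} = x_t - \gamma \, \nabla f_{s_t}(\hat{x}_t), \qquad \hat{x}_t = x_t + n_t,
  \]

where \(s_t\) indexes the randomly sampled example, \(\hat{x}_t\) is the (possibly stale) iterate actually read by the asynchronous update, and \(n_t\) is the bounded noise accounting for the difference between the value read and the true iterate \(x_t\).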

Journal:
  • PVLDB

Volume: 10  Issue:

Pages: -

Publication date: 2017